In this practical you will analyse microarray data using R. The data files provided are data files that already underwent image analysis and can be downloaded from Canvas (Affy_data.zip). This is rat genomic data from Affymetrix GeneChips, a very popular platform for measuring genome-wide expression levels. In this experiment, some rats receive a treatment while some others are used as a control. The downloaded files include 6 CEL files, which are text files containing the information from each microarray experiment, and a “experimentdescription.txt” text file containing the description of each microarray (which CEL files corresponds to rats that received treatment and which correspond to the control group). Feel free to open and explore them. Note that in this case there is only one condition, but in more complicated experiments more than one column can be included in the descriptions file. The aim of this practical is to find genes that are differentially expressed in rats which received a treatment and find relevant information or annotate them. For this purpose, you will use Bioconductor packages. Bioconductor is a widely used open source software for Bioinformatics that provides tools for biological data analysis using R. Some of the packages needed for this practical are specific for Affymetrix (“affy”, “affyPLM”, “annaffy” and “rgu34a.db”) and the package “limma” is used for statistical analysis (not specific to Affymetrix).
To check if these packages are installed you can try to load
Check dependencies for practical
if (!require("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install(c("affy","affyPLM", "limma", "annaffy", "rgu34a.db"))
## Bioconductor version 3.20 (BiocManager 1.30.25), R 4.4.1 (2024-06-14)
## Warning: package(s) not installed when version(s) same as or greater than current; use
## `force = TRUE` to re-install: 'affy' 'affyPLM' 'limma' 'annaffy' 'rgu34a.db'
Bioconductor packages are very well documented. You can access their documentation using i.e. browseVignettes(“limma”) or to find more information about the usage of a specific function from any package you can use i.e. ?boxplot.
To start working, set the working directory were the files are located using setwd() i.e. setwd(“Affy_Data”) and it is a good practice to clear the environment and the graphics area using rm(list=ls()) and graphics.off(). Now you are ready to start the analysis. As a quick guideline, the procedure will consist on:
The first thing to be done is to load the microarray data. For this purpose, the affy package is used:
library(affy)
## Loading required package: BiocGenerics
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## anyDuplicated, aperm, append, as.data.frame, basename, cbind,
## colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
## get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
## match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
## Position, rank, rbind, Reduce, rownames, sapply, saveRDS, setdiff,
## table, tapply, union, unique, unsplit, which.max, which.min
## Loading required package: Biobase
## Welcome to Bioconductor
##
## Vignettes contain introductory material; view with
## 'browseVignettes()'. To cite Bioconductor, see
## 'citation("Biobase")', and for packages 'citation("pkgname")'.
Affy.Expt = read.AnnotatedDataFrame("Affy_Data/Affy_Data/experimentdescription.txt", header=TRUE, row.names=1, sep="\t")
Affy.Data = ReadAffy(filenames=paste0('Affy_Data/Affy_Data/',rownames(pData(Affy.Expt))), phenoData=Affy.Expt,verbose=TRUE)
## 1 reading Affy_Data/Affy_Data/c1.cel ...instantiating an AffyBatch (intensity a 285156x6 matrix)...done.
## Reading in : Affy_Data/Affy_Data/c1.cel
## Reading in : Affy_Data/Affy_Data/c2.cel
## Reading in : Affy_Data/Affy_Data/c3.cel
## Reading in : Affy_Data/Affy_Data/t1.cel
## Reading in : Affy_Data/Affy_Data/t2.cel
## Reading in : Affy_Data/Affy_Data/t3.cel
class(Affy.Expt)
## [1] "AnnotatedDataFrame"
## attr(,"package")
## [1] "Biobase"
class(Affy.Data)
## [1] "AffyBatch"
## attr(,"package")
## [1] "affy"
Affy.Expt is an AnnotatedDataFrame, an object from Biobase (the base functions from Bioconductor), and Affy.Data is an AffyBatch object from the affy package. You can explore what is inside this objects using the function attributes().
attributes(Affy.Data)
## $.__classVersion__
## R Biobase eSet AffyBatch
## "4.4.1" "2.66.0" "1.3.0" "1.2.0"
##
## $cdfName
## [1] "RG_U34A"
##
## $nrow
## Rows
## 534
##
## $ncol
## Cols
## 534
##
## $assayData
## <environment: 0x117530f30>
##
## $phenoData
## An object of class 'AnnotatedDataFrame'
## sampleNames: c1.cel c2.cel ... t3.cel (6 total)
## varLabels: Treatment
## varMetadata: labelDescription
##
## $featureData
## An object of class 'AnnotatedDataFrame': none
##
## $experimentData
## Experiment data
## Experimenter name:
## Laboratory:
## Contact information:
## Title:
## URL:
## PMIDs:
## No abstract available.
## Information is available on: preprocessing
## notes:
## :
##
##
## $annotation
## [1] "rgu34a"
##
## $protocolData
## An object of class 'AnnotatedDataFrame'
## sampleNames: c1.cel c2.cel ... t3.cel (6 total)
## varLabels: ScanDate
## varMetadata: labelDescription
##
## $class
## [1] "AffyBatch"
## attr(,"package")
## [1] "affy"
names(attributes(Affy.Data))
## [1] ".__classVersion__" "cdfName" "nrow"
## [4] "ncol" "assayData" "phenoData"
## [7] "featureData" "experimentData" "annotation"
## [10] "protocolData" "class"
attributes(Affy.Data)$annotation
## [1] "rgu34a"
Here (^^) you found the source for annotation, which will be carried out later. Also check the size of the dimensions of the microarray and see the first lines of the expression data. You can see the whole matrix typing View(exprs(Affy.Data)).
dim(Affy.Data)
## Rows Cols
## 534 534
head(exprs(Affy.Data)) # First line of expression data in Affy.Data
## c1.cel c2.cel c3.cel t1.cel t2.cel t3.cel
## 1 110.5 115.0 116.5 124.8 134.5 204.3
## 2 10359.3 10697.5 11334.0 10685.5 9573.5 16397.8
## 3 117.3 139.3 125.0 136.8 160.5 145.0
## 4 10578.5 10829.8 11281.3 10888.8 10032.3 16954.3
## 5 59.0 71.0 76.3 69.5 77.3 75.0
## 6 109.5 106.0 115.3 95.0 138.3 123.0
Now, you may want to generate some plots to explore your data. The generated plots will be displayed on the graphics area on RStudio. Alternatively, you may want to display the plots in a separated window by typing x11() or save it into a file using png(), tiff(), pdf()… In order to check how they work and the input parameters they take you can use ?png, ?tiff or ?pdf. This information is useful because if your figure is too large for the RStudio graphics area, then you will get a “Error in plot.new() : figure margins too large” error and you will need an alternative solution.
You can generate boxplots and histograms to see the distribution of the data. You will plot controls in green and treatments in red.
col = c(rep('green'), 3, rep('red', 3))
boxplot(Affy.Data, col=col, main= 'Boxplot of raw data')
## Warning: replacing previous import 'AnnotationDbi::tail' by 'utils::tail' when
## loading 'rgu34acdf'
## Warning: replacing previous import 'AnnotationDbi::head' by 'utils::head' when
## loading 'rgu34acdf'
##
#
hist(Affy.Data,type="l", col=col,main = "Histogram of raw data")
#
You can plot the chip images for each one of the arrays at the probe level and a RNA degradation plot. Use the help and check the documentation to find more information about the functions used to generate the plots. Check the function sapply as well, as it is not strictly related to microarray analysis but it comes in handy when dealing with vectors in R.
par(mfrow=c(2,3))
chip = sapply(1:ncol(exprs(Affy.Data)),
function(x) image(Affy.Data[,x]))
RNA.deg = AffyRNAdeg(Affy.Data)
par(mfrow=c(1,1))
plotAffyRNAdeg(RNA.deg)
par(mfrow=c(1,1))
Pre-processing for microarray data consists of three steps: background correction to remove the effect of unspecific binding, normalization between arrays and summarization to combine all the information from the multiple probes in the same array (genes are represented more than once in the same array). You will perform preprocessing using the threestep() function from the affyPLM package.
library(affyPLM)
## Loading required package: gcrma
## Loading required package: preprocessCore
norm.rma2.quantile <- threestep(Affy.Data, background.method="RMA.2",
normalize.method="quantile",summary.method="median.polish")
norm.masim.quantile <- threestep(Affy.Data, background.method="MASIM",
normalize.method="quantile",summary.method="median.polish")
norm.rma2.scaling <- threestep(Affy.Data, background.method="RMA.2",
normalize.method="scaling",summary.method="median.polish")
norm.masim.scaling <- threestep(Affy.Data, background.method="MASIM",
normalize.method="scaling",summary.method="median.polish")
Then boxplots and histograms are useful to know which methods are better for our data.
# Compare boxplots
par(mfrow=c(2,3))
boxplot(Affy.Data, col= col, main = "Raw")
boxplot(norm.rma2.quantile, las=3, cex.axis=0.5, pch=".",
col=col,main = "RMA.2+quantile")
boxplot(norm.masim.quantile, las=3, cex.axis=0.5, pch=".",
col=col,main = "MASIM+quantile")
boxplot(norm.rma2.scaling, las=3, cex.axis=0.5, pch=".",
col=col,main = "RMA.2+scaling")
boxplot(norm.masim.scaling, las=3, cex.axis=0.5, pch=".",
col=col,main = "MASIM+scaling")
par(mfrow=c(1,1))
# Compare histograms
par(mfrow=c(2,3))
hist(Affy.Data, col=col,main = "Raw")
hist(norm.rma2.quantile, las=3, cex.axis=0.5, pch=".",
col=col,main = "RMA.2+quantile")
hist(norm.masim.quantile, las=3, cex.axis=0.5, pch=".",
col=col,main = "MASIM+quantile")
hist(norm.rma2.scaling, las=3, cex.axis=0.5, pch=".",
col=col,main = "RMA.2+scaling")
hist(norm.masim.scaling, las=3, cex.axis=0.5, pch=".",
col=col,main = "MASIM+scaling")
par(mfrow=c(1,1))
#